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Abstract 

The spectrotemporal receptive field (STRF) provides a versatile and integrated, 
spectral and temporal, functional characterization of single cells in primary auditory 
cortex (AI). In this paper, we explore the origin of, and relationship between, different 
ways of measuring and analyzing an STRF. We demonstrate that STRFs measured 
using a spectrotemporally diverse array of broadband stimuli — such as dynamic rip- 
ples, spcctrotemporally white noise, and temporally orthogonal ripple combinations 
(TORCs) — are very similar, confirming earlier findings that the STRF is a robust lin- 
ear descriptor of the cell. We also present a new deterministic analysis framework that 
employs the Fourier series to describe the spectrotemporal modulations contained in 
the stimuli and responses. Additional insights into the STRF measurements, including 
the nature and interpretation of measurement errors, is presented using the Fourier 
transform, coupled to singular- value decomposition (SVD), and variability analyses in- 
cluding bootstrap. The results promote the utility of the STRF as a core functional 
descriptor of neurons in AI. 

Key Words: spectrotemporal receptive field, modulation transfer function, auditory 
cortex, ripple, variability, singular- value decomposition, ferret 
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1 Introduction 

It has been over twenty years since the spectrotemporal receptive field (STRF) was con- 
ceived to describe and measure audit ory neurons' jo i nt sensitivity to the spectral and tempo- 
ral dimensions of acoustical enerev dHermes et al. . 1981 : Aertsen and Johannesrnal . 1981bl : 



Smolders et al. . 19791 : Eggermont et al. . 198ll : Ijohannesma and Eggermon tl. Il983l). It 



was 



specifically associated with (1) stimuli characterized by randomly varying spectrotempo- 
ral features, and (2) an approach labeled reverse correlation, by which the neuron in- 
forms the experimenter, via action potentials, of the features that were of interest to it 
( de Boer and de Jonghl Il978[ Eggermont et al. . 1983bl ). The STRF offered a view of neu- 



ronal function that complemented, and was usually consistent with, that obtained using 
classical stimuli such as tones (tuning curves and rate-level functions), clicks (impulse re- 
sponses), and noise (bandwidth sensitivity). In addition, it nea tly fit within an analytical 
framework, bolstere d by the fields of time-frequency analysis ( Cohenl . Il995[ ) and nonlin- 



ear systems theory (|Eggermontl . Il993h . within which the functionality of neurons could, in 
principle, be systematically explored to any level of detail. 

The term "STRF" does not denote here the full complex (likely nonlinear) receptive 
field of an auditory neuron. Rather it is a technical term that has traditionally been used 
to refer specifically to the linear relationship between the time-dependent spike rate of a 
neuron and the time- and frequency- dependent energy — in short, the dynamic spectrum — 
of a stimulus. In order to measure the STRF, the reverse-correlation approach prescribes 
computing the average dynamic spectrum of those portions of a stimulus preceding the 
neuron's spikes. In this context, the STR F is co mmonly interpreted as the spectrotemporal 



pattern that optimally activates a neuron (|Young. 1998) . Theoretically, as long as all patterns 



occur randomly, in dependently, and e quiprobably, the STRF can be revealed by this "spike- 
triggered average" ( Eggermont . 1993| ). 



As with tuning curves, rate-level functions, and other commonly used neuronal response 
measures, the STRF provides only a limited view of the receptive field of a neuron, one 
that is useful only within the context of the experiment or the nature of information sought 
from it. For example, tuning curves are useful as approximate indicators of a units BF 
and bandwidth, but are largely irrelevant as a gauge of its dynamic range and temporal 
properties. Similarly, the STRF is a useful measure of spectrotemporal features likely to 
drive a cells responses. However, being a measure of the linear component of the stimulus- 
response relationship, it is mostly effective in predicting the linear aspects of the responses, 
predictions that can be accurate if the non-linear portions are small or are well known and 
can be accounted for in the measurement (e.g., spike-rate rectification and saturation). In 
some cases, the linear component of the response is small and hence one does not expect 
clean and reliable STRF measurements, i.e., the STRFs exhibit significant randomness or 
high variability across presentations, or are poor predictors of responses to novel stimuli. 
Examining these sources of variability and prediction errors provides useful information 
regarding the limitations of the STRF and ways to extend it beyond the linear domain. 

Although the STRF has been slow to mature, it is now increasingly used to study the 
physiology of central auditory neurons. In retrospect, the often slow pace of progress can 
be partially attributed to the reverse-correlation methodology, which remains fairly opaque. 
In particular, reverse correlation provides no straightforward formal basis for describing 
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the effectiveness of, or relations between, specific stimuh, because only the average statis- 
tics of stimuli are specified. For example, Gaussian broad-band noise, the "ideal" stim- 
ulus for r everse-correlation, is o ften ineffective when applied to central auditory neurons 
(but =ee \ K.«.r T.U.^..d B)). Meanwhile, a range of other stimuli and asso- 



ciated techniques have bee n auditioned, modulated broad-band noise (iMiller et al. . 2002 : 



Escabi and Schreinei j_ 20021). randorn sequences of tones or chords (lAertsen and Johannesma 



1981al: Epping and Eg germont , 19851: Schafer et al.L 19921: de Charms et al.L 
200d:^Rutko wski et a l.. 2002) , and natura l stimuli ('Aertsen and Jo hannesn 



1998^Theunissen et al 



Jo hannesmal . Il981al : lYeshurun et al. 



1987; .Schafer et al. 11992; .Theunissen et al.. . .2000i : .Sen et al.. . .2001 ). While it is sometimes 
implied that the auditory system processes different stimuli differently, it has not been made 
clear, because of the lack of vocabulary, to what extent different stimulation methods should 
yield different results. Additionally, most of the employed stimuli share randomness in their 
spectrotemporal design, in accordance w ith the reverse-correlation approach, b ut this style 



of stimulation is bound to be inefficient ()Victor and Knight , 

Because of these shortcomings, we endeavored to rec ord a deterministic and analyt- 



1979[lSutteil . ll992l ). 



ical reformulation of spectrotemporal reverse correlation ( Klein et al. . 2000l) . The roots 



of this new methodology are in the Fourier-based analvsis (|PapouliX 1962f ) of any given 
stimulus in terms of its spectrotemporal modulation frequency content. Each spectrotem- 
poral modulation frequency is the conjunction of a spectral and a temporal modulation 
frequency; the higher the spectral modulation frequency, the sharper the spectral feature 
(e.g., sharp peaks or edges in the spectrum), and the higher the temporal modulation fre- 
quency, the more abruptly that feature changes in time. As a population, the strongest 



tral and temporal modulation frequencies {. 


^,ees and MoUei. 1983 


: IShamma et al.. 


1995 


Schreiner and Calhoun. 1995; Kowalski et al.. 


1996al: Depireux et al. 


, 2001; 


Sen et al.. 


2001 


Miller et al. . I2OO2I: lEscabi and Schreineii. I2OO2I). Not surprisinglv. the most fruitful stimuli 



have had spectrotemporal modulation frequencies concentrated within this range. Our ap- 
proach extends these past successes by making explicit the relations between the spectrotem- 
poral modulation frequency content of a stimulus, the stimulus duration and bandwidth, and 
the accuracy of the STRF measurement. This enables the flexible design of diverse stim- 
uli that minimize both stimulation time and measurement error, within the constraints of 
a particular experiment. These constraints include information about not only the STRF, 
but also about the nonlinear and stochastic aspects of the stimulus-response transforma- 
tion, which are not directly described by the STRF. Another important advantage of this 
methodology is that it can be used to describe the mechanics of STRF measurement with 
any given stimulus, thus providing a language with which apparently disparate methods can 
be discussed. 

We focus in this article on three specific types of stimuli with increasing level of complex- 
ity, applied in primary a uditory cortex (Al) o f the a nesthetized fer r et. A t one extreme are the 



dynamic ripple stimuli (iKowalski et al 



I199 



2001[ ). which each consist 



Il996a. .b; Depireux et al. . 
of a single spectrotemporal modulation frequency. At the other extreme is spectrotemporally 
white noise (STWN), which contains many superimposed spectrotemporal modulation fre- 
quencies. Intermediate are temporally orthogonal ripple combinations (TORCs), consisting 
of special combinations of several spectrotemporal modulation frequencies each. We shall 
explore the relations between these stimuli, and compare the responses they evoke and the 
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resulting STRF measurements. Among the issues addressed are the similarity between the 
STRF measurements, their fidelity and noise-robustness, their susceptibility to common neu- 
ronal nonlinearities, and the expected amount of data necessary to achieve an measurement 
with a desired level of accuracy. The methods used to address these issues are quite general, 
though the specific findings apply only to the population of neurons in AI studied here. 



2 Methods 
2.1 Theory 

In this section, we outline the methodological basis of this study. Its key element is an 
analytical description of the stimulus-to-response transformation, in terms of the processing 
of spectrotemporal modulation frequencies. In this context, the result of reverse correlation 
is derived, first assuming that the response is deterministically and linearly related to the 
stimulus, and then considering the separate effects of response variability and nonlinearity. 
At the core of the STRF-based model of neural functionality is the following equation: 

r{t) = J J h{T, x) ■ s{t — T, x) dr dx, (1) 

where the neuronal response r at any time t is the linear integration of influences aris- 
ing from stimulus energy s at different tonotopic locations x (here corresponding to the 
logarithm of frequency) and different times in the past r. The strength and nature of the 
influences — whether they are excitatory (positive), or suppressive or inhibitory (negative) 
is described by the STRF as denoted by /i(r, x). In the context of reverse correlation, r(t) 



is typically taken to be the t ime-dependent s pike rate of a neuron ffEggermont et al.l . Il983 
Keller and Takahashil . hOOd ISen et al.l . Hool 



2.1.1 The Linear Processing of Spectrotemporal Modulation Frequencies 

Our analytical description of dynamic spectra is based upon the Fourier series ( Papouli^ . 



1962f ). using elemental Fourier components which are cosine waves as a function of both t 



and x: a ■ cos{27nut + 2'n'Qx + tp). The wave has a peak value of a and starting phase ip. The 
wave frequency is w cycles/second (Hz) along t and fl cycles/octave (cyc/oct) along x. Since 
the dynamic spectrum details the modulation of acoustic energy as a function of both x and 
t, these frequencies are referred to as modulation frequencies: spectral (f2) and temporal 
(w). A single Fourier component is said to consist of a single spectrotemporal modulation 
frequency, deflned by a speciflc {w, Q) pair. Just as a sum of pure tones of various frequencies, 
amplitudes, and phases can describe any acoustic waveform over a flnite duration, a sum of 
various spectrotemporal modulation frequencies (with appropriate amplitudes and phases) 
can describe any dynamic spectrum over a flnite duration T and bandwidth X. Further, 
just as the frequency content of an acoustic waveform (i.e., the amplitudes and phases of its 
constituent tones) is described by its (Fourier) spectrum, the spectrotemporal modulation 
frequency content of a dynamic spectrum is described by its spectrotemporal modulation 
spectrum S. 

When the STRF is recast as operating upon S, one arrives at a complementary descrip- 
tion called the spectrotemporal modulation transfer function H. H[w,Q], which is the 2-D 
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Fourier transform of the STRF h{t,x), details the linear component of neural processing 
of spectrotemporal mod ulation frequencies. Su ch processing is a l ready under study in au- 
ditory neurophysiology (i Kowal ski et al.l . Il996al )b[ Depireux et al. , 2001 , Miller et al. , 2001 



2OO2I : Escabi and Schreiner. . .20o3 ) and psychoacoustics ( Chi et al. . Il999(l. and is also being 



inyes t igated for yarious s ignal-processing tasks, inc luding audio coding (lAtlas and Shamma 



2OO3I: iKlein et al.l. l2003t) a nd speech recognition (iHermmanskyl . Il999l : iNadeu et all I2OOII 
Kleinschmidt and Gelbaxil . I2OO2I: iKleinschmidil . l2002[) . 

S and H are mathematically defined as follows. Consider a dynamic spectrum s{t,x) 
and an STRF h{t,x), both giyen oyer a finite range of T seconds and X octayes. Using the 
exponential form of the Fourier series, s can be expressed by the sum 



s(t, x) 



00 

E 

k=—oci I- 



00 

(a[wk,^i]e^ 



ji>[wj,,ni]\ j2Tr{wkt+nix) 



(2) 



where e is the base of the natural logarithm, j = a/— 1, k and / are integers, Wk = k/T, and 
Qi = l/X. This is perhaps the simplest form of the Fourier series to use; ironically it employs 
"complex" exponential functions. These functions are related to the real-yalued Fourier 
components through the trigonometric identity cos(0) = ^ (^e^'^ + e~^'^^ , etc. Accordingly, 
each term in this sum, indexed by k and I, has a complex-conjugate counterpart, indexed 
by —k and — /, such that a[wk,fii] = a[w-k,fi-i] and ilj[wk,^i] = —ip[w-k,^-i]. Henceforth 
we will simplify the notation by dropping the k and I subscripts, howeyer keeping in mind 
that w and Q are discrete- yalued yariables (as indicated by the square brackets). Thus, 
the amplitudes and phases of the modulation-frequency components are giyen by a[w,fi] 
and which together form = a[w,il]e^'^^^'^\ As for the STRF, its Fourier 

series description can be represented by the same sinusoidal components, but with different 
amplitudes b[w, f2] and phases 9[w, il], which together form H[w, fi] = b[w, il\e^^^'^'^\ As we'll 
see, b generally describes describes the strength of the response to particular spectrotemporal 
modulation frequencies, while 6 describes the timing of the response. 

In practice, s(t, x) is represented on a computer by discrete samples, s[tk, xi] = s{kAt, lA^), 
taken at a rate of 1/At samples/second and l/A^ samples/octaye, where k and I are inte- 
gers. Again, we will drop the k and / subscripts, howeyer kee ping in mind that t and 
X are now discrete- yalued yariables. By the sampling theorem flOppenheim and Schafer . 
1989> ). this assumes that S is sufficiently smooth; that is, it can be described by a lim- 
ited number of temporal and spectral modulation frequencies no higher than 1/(2 A^) and 
l/{2Ax), respectiyely. Within these limits, ^[w,!]] is then obtained by computing the Dis- 
crete Fourier Tr ansform (DFT) of s\t,x] (using the Fast Fourier Transform, or FFT, algo- 
rithm) ([Oppenh eim and Schafer . [l989). Analogously, H[w,D] is obtainable yia the (Dis- 
crete) Fourier Transform of the STRF h[t,x]. 

Since the response, r{t), depends only on time, its Fourier-series description utilizes 
only temporal modulation frequencies. It can be deriyed by inserting the Fourier-series 
descriptions of s and h into Eq. and carrying out the integration. The result is that the 
Fourier Transform of the sampled response r[t] has the form 



R[w] = H[w, -n] ■ S[w, ^] = Y ^] ■ ^[^^ 



(3) 



Recall that in Eq. the response was obtained by integrating oyer the spectral axis (x) 
after temporally conyolying the dynamic spectrum with the STRF; here, the conyolution is 
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realized via the multiplication of Fourier Transforms^ ( Oppenheim and Schafer . 1989[ l. and 



the integration over x is replaced by a summation over Q . Therefore, each frequency w in 
the response results from all spectrotemporal modulation frequencies in the stimulus sharing 
the same temporal component w. 

2.1.2 Fourier-based Reformulation of Spectrotemporal Reverse Correlation 

The STRF was, in Section 12.1.11 recast in terms of the processing of spectrotemporal mod- 
ulation frequencies. The result of spectrotemporal reverse correlation will now be derived in 
this context. 

If spike times are quantized, and stimuli are sampled, with a temporal resolution At, 
then the average stimulus preceding a neuron's spikes is proportional to the temporal cross- 
correlation between the stimulus and a "binned spike trai n" response, y\t] , consisting of 
the number of spikes observed in consecutive At intervals (jEggermont et al .. 19831]). For 
now, we assume that y[t]/At, with units of spike rate (spikes/second), is equal to r[t] (the 
sampled STRF-based response), whose Fourier Transform R[w] was derived in Eq. Q. 
Cross-correlation is a linear operation an d, much like convolution, it ca n be realized via 
the multiplication of Fourier Transforms^ f Oppenheim and Schafer . 19891 ). This takes the 



following form, in the case of spectrotemporal reverse correlation: 

R[w]- s*[w,-n] = H[w,n]-\s[w,-n]\'^ 

+ H[w,Q']S[w,-Q']S*[w,-Q] 

= H[w,n]ia[w,~n]y + i[w,n], (4) 

where * denotes complex conjugation and [^[ly,^]! = \JS[w, ^7] ■ S*[w, fi] = a[?i;,ri] is the 
magnitude of S. Eq. represents the Fourier Transform of the reverse correlation result. 

An important special case exists when 1 5*1 is flat {a[w, Q] = a) over the extent of H that 
is nonzero, and further e[iy,r2] = 0. Then, Eq. is proportional to the H, with 

R[w]-S*[w,-n] 
H[w,ll\ = — , (5) 

Since /i[t,a:] is, by definition, the inverse Fourier Transform of H[w,fl], this implies that, in 
this special case, reverse correlation will yield a result proportional to the STRF. 

This desirable result has immediate implications for effective stimulus design. That 
the spectrotemporal modulation spectrum should be flat equivalently requires the stimulus 
contain in equal strength all spectrotemporal modulation frequencies needed to construct 
H. If the stimulus contains a subset of the necessary modulation frequencies, then only 
part of H can be constructed: H will be filtered. The e = requirement is not so simply 
related. This is a systematic stimulus-induced error, dependent upon temporal correlations 
between different spectrotemporal modulation frequencies in the stimulus (it may also be 

^Strictly speaking, this implements a circular convolution. If the stimulus is not periodic, this 
can be converted to a linear convolution by including zeros (silence) before and after the stimulus 
ijOppenheim and Schafeilll989l) . 

^Modulo the previous note concerning circular convolution 
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framed in t erms of temporal correlations between the s timulus energy at different tonotopic 
locations) ([Klein et al. . 20001 : Theunissen et al. . 2000l ). It will be nonzero if the stimulus 
contains multiple spectrotemporal modulation frequencies that share the same value of \w\, 
and therefore by Eq. Q evoke the same frequency in the response. For a general stimulus, e 
will not be zero, or even small, and therefore one of three methods must be used to eliminate 
or reduce its effects: First, if stimuli are sufficiently diverse over time or over multiple 
stimuli, then e a symptotically appr oaches zero as the stimulus duration or the number of 
stimuli increases fel ein et all l200ql) : second, specially designed stimuli may be employed for 
which e is zero f Kvale et al.l . 19981 : Klein et al. . 2000[ ): and third, add itional computations 



may be undertaken to try and adjust for the correlations in the stimulus ( Aertsen et al. . 1980l: 



Aertsen and Johannesma . 1981a : Theunissen et al. . 2000l l. In this article, we concentrate on 
the first two of these methods. 

Given some knowledge about H, creative stimulus design is facilitated by the simple 
relationship of Eq. (0) between the measurement of points in H and the corresponding 
points in the spectrotemporal niodulation spectrum. F or example, suppose H is quadrant- 
separable ( Kowalski et al.l 1996bl : Depireux et al. . 2001), i.e., within each quadrant, the value 
at every point is the product of a single vertical cross-section with a single horizontal cross- 
section. Then, using only stimuli from a single vertical cross-section and a single horizontal 
cross-section within each quadrant is sufficient to measure the entire H. As discussed below, 
the assumption of quadrant separability is made for STRFs measured using one stimulus 
set (dynamic ripples). Note that the same measurements could be made using differently 
structured stimuli that directly probe all points of H. The extent that measured STRFs 
agree across stimulus sets measures linearity; but the extent that STRFs measured using 
dynamic ripples disagree with the other measured STRFs, does not distinguish between lack 
of linearity and lack of quadrant separability. 

Thus far, we have assumed that the response is deterministically and linearly related to 
the dynamic spectrum. In the next two sections, we relax these assumptions and consider 
how response variability and nonlinearity effects the real-world results. Accordingly, Eq. 
(0) is henceforth treated as a measurement of H (and subsequently the STRF), using an 
observed response that is not necessarily fully described by the STRF. 



2.1.3 Reliability of the STRF Measurement 

We have assumed thus far that the transformation from stimulus to response is deterministic. 
However, i n response to identical stimulu s presentations, neuronal responses exhibit inherent 
variability ( Shadlen and Newsomd . 1998[ l , and so the result of reverse correlation is somewhat 
indeterminate. Therefore, Eq. (j^ should be interpreted as the mean result, which would be 
obtained by averaging the results of an infinite number of identical experiments. Due to the 
linearity of reverse correlation, this is also the result obtained if r[t] is taken to be the mean 
of y[t]/At (the mean time-dependent spike rate). 

This mean result is called the signal. The difference between the actual measurement 
and its mean is called noise. The exact form of the noise varies from measurement to 
measurement. The mean squared-magnitude of the noise, as a function of t and x, is called 
the variance of the measurement (the square of the standard error). The overall reliability 
of the measurement can be gauged from the signal-to-noise ratio, SNR = P/ {a^}, which 
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is the average power (squared-magnitude) of the signal (P) relative to the average variance 
of the noise ((o"^)), where the averages are perfor med over all t and x. Note that both P 
and (o"^) are preserved by the Fourier Transform ( Papouli^ . Il962l : lOppenheim and Schafer . 



Il989[ ). and therefore the SNR of h[t,x] is identical to that of H[w,fl\ (with the averages 
performed over w and fl). 

With this in mind, the signal and noise components of the SNR can be directly traced 
through Eq. (jS)) to the response. The variance of H is found to be 

Var a]} = y-r{Rl.]}\Si.,-a]l^ ^ Var {R^} ^ 

Oj a 

since R[w] is the only source of variance. 

Analogously, the squared-magnitude (power) of H is 

l„l,,,n]f = \^. (7) 

If r is taken to be the mean response, this equation describes the signal power. If instead r 
denotes the actual response, then the resulting H measurement (and equivalently, the STRF 
measurement) will be composed of signal plus noise, and therefore its average power will 
exceed P by (cx^), provided the signal and noise components are uncorrelated. 

In summary, response variability is a source of error in the STRF measurement. This is 
referred to as non-systematic error, since its exact form varies from measurement to mea- 
surement. The expected size of the error is quantified by (cr^). At the same time, the signal 
power (P) and response power are closely related. Therefore, stimuli that maximize the 
response power relative to the response variance will result in more reliable STRF measure- 
ments (higher SNR). Note also that, in theory, the SNR of the STRF measurement could be 
obtained directly from the response, without actually computing the STRF. 

2.1.4 Nonlinear Contributions 

So far, we have only discussed the relationship between modulations in the dynamic spectrum 
and modulations of the mean spike rate as being purely linear. Of course nonlinearities 
such as rectification (the strictly posi t ive na ture of the spike rate) and synaptic depression 
( Chance et al. . 19981 : Carandini et al. . 2002| ) introduce additional response components. To 



the extent that these components are correlated with the stimulus, they result in systematic, 
stimulus- dependent errors to the STRF measurement. 

A detailed accounting for various nonlinearities is not given here. Suffice it to say that 
a portion of the response can be described by Eq. (0), and the remaining nonlinear por- 
tion may be described by additional term s in a Volterra or Wiener functional expansion . 
which have long been used in neuroscience ( Eggermontl . Il993[ ) and systems theory (Schetzen 



The portion of the nonlinearity manifest at the odd- and even-numbered terms of 



the expansions is dubbed odd- and even-order nonlinearity, respectively. Fourier-based de- 
scriptions of the input-output characteristics of such systems are alrea dy well studied (e.g. 



scriptions oi tne mput-output cnaracteristics oi sucn systems are alrea dy well studied [e.g., 
( Victor and Knighn . 19791 : Victor and Shaplev . 1980l: Bovd et allll983l )). They describe how 



multiple stimulus frequencies (e.g., spectrotemporal modulation frequencies) interact to form 
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nonlinear response frequencies, or distortion products. It is those distortion products mani- 
fested at frequencies overlapping with the linear portion of the response that interfere with 
the STRF measurement. 

Knowledge about the stimulus dependence of distortion produ cts facilitates the detec- 



tion, identification, and extract i on of nonlinear re sponse elements fISpekreijse and Posting 



19701: IVictor and Shaple^1 ■ iBovd et al.l . llQSsh . For example, odd- and even-order non- 



linearities are distinct in that their distortion products are composed of products of odd and 
even numbers of stimulus elements, respectively. By straightforward trigonometry, one can 
determine the possible response frequencies that may be observed for a stimulus of known (or 
cleverly designed) composition, and further determine how the amplitude of these distortion 
products will change if a gain is applied to the stimulus. 

2.2 Experimental Details 

We now detail how the above methodology is exploited by the methods used in this study. 
2.2.1 Surgery and animal preparation 

Data were collected from 16 domestic ferrets {Mustela putorius) supplied by Marshall Farms 
(Rochester, NY). The ferrets were anesthetized with sodium pentobarbital (40 mg/kg) and 
maintained under deep anesthesia during the surgery. Once the recording session started, a 
combination of Ketamine (8 mg/Kg/Hr), Xylazine (1.6 mg/Kg/Hr), Atropine (10 /ig/Kg/Hr) 
and Dexamethasone (40 /xg/Kg/Hr) was given throughout the experiment by continuous 
intravenous infusion, together with Dextrose, 5% in Ringer solution, at a rate of 1 cc/Kg/Hr, 
to maintain metabolic stability. The ectosylvian gyrus, which includes the primary auditory 
cortex, was exposed by craniotomy and the dura was reflected. The contralateral ear canal 
was exposed and partly resected, and a cone-shaped speculum containing a miniature speaker 
(Sony MDR-E464) wa s sutured to the meatal stump. For more details on the surgery see 
( Shamma etHl Il993t) . 



2.2.2 Recordings, spike sorting, and selection criteria 

Action potentials from single units were recorded using glass-insulated tungsten microelec- 
trodes with 5-7 Mfi tip impedance at 1 kHz. In each animal, electrode penetrations were 
made orthogonal to the cortical surface. In each penetration, cells were typically i solate d at 
depths of 350-600 yum corresponding to cortical layers III and IV (|Shamma et al.Lll993h . In 



12 animals, neural signals were fed through a window discriminator and the time of spike 
occurrence relative to stimulus delivery was stored using a computer. In the other 4 animals, 
the neural signals were stored for further processing offline. Using MATLAB software de- 
signed in-house, action potentials were then manually classified as belonging to one or more 
distinct neurons, and the spike times for each neuron were recorded. The action potentials 
assigned to a single neuron met the following criteria: (1) the peaks of the spike waveforms 
exceeded 4 times the standard deviation of the entire recording; (2) each spike waveform 
was less than 2 ms in duration and consisted of a clear positive deflection followed immedi- 
ately by a negative deflection; (3) the spike waveforms were not visibly different from each 
other, modulo the noise; (4) the histogram of inter-spike-intervals evidenced a minimum time 
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between spikes (refractory period) of at least 1 ms. This procedure occasionally produced 
units with very low spike counts. After consulting the distribution of spike counts for all 
units, units that fired fewer than one spike per two seconds of stimulation were excluded 
from further analysis. 



Analysis of the dynamic-ripple recordings was published previously (jPepireux et al 



20011 ). Here we used the same selection criteria for those recordings that were used in 
that study. Those criteria were somewhat more stringent than those used for the TORC and 
STWN recordings; consequently, there are conspicuously fewer instances of low-SNR STRFs 
and low spike counts in the dynamic-ripple results, with respect to the TORC and STWN 
results. 

2.2.3 Stimulus Realization and Delivery 

A stimulus is designed by first specifying its envelope S. Recall from Section 12.1.21 that the 
spectrotemporal modulation frequencies contained in the stimulus are used to reconstruct 
the STRF. Through the properties of the Fourier Series described in Section 12.1.11 the set 
of frequencies required for this construction is defined by four parameters: T and X, the 
temporal extent (memory) and spectral extent (bandwidth) of STRF; and Wc and flc, the 
maximum temporal and spectral modulation frequencies in H. For all results reported here, 
T was 250 ms, X was 5 octaves, Wc was 24 Hz, and Qc was 1.4 cyc/oct. These values were 
chosen a priori based upon th e l ikely structure of STR Fs in AI, as inferred from previous 



studies (jKowalski et al.L Il996al )bl: iDepireux et al.L 120011 ) . 



The requisite set of modulation frequencies need not be contained within a single stimu- 
lus; it may be divided among multiple stimuli. Stimuli thus devised are used to independently 
reconstruct different areas of H, which are finally combined to form the complete measure- 
ment. Some benefits of this scheme ir iclude the r e ducti on of measurement errors and the 
option of using short-duration stimuli ( Klein et al. . 2000l ). 



The design of S subsequently specifies (via an inverse Fourier Transform) a desired or 
"target" dynamic spectrum. We realized this target with a sum of amplitude-modulated 
(AM) tones of various c arrier frequencies (typically 100 tones per octave) and random phases 



( Kowalski et al.L Il996a[ ). First, the target is scaled so that its values lie within ±90% of the 
mean value. The mean value, which corresponds to the mean amplitude of the tones, is 
set 10-20 dB above the neuron's threshold (measured previously with pure tones). Finally, 
the AM pattern of each tone is specified by the cross-section of the envelope S at the 
corresponding spectral location x. 

Three types of stimuli are used in this study: dynamic-ripple stimuli, temporally orthog- 
onal ripple combinations (TORCs), and spectrotemporally white noise (STWN). As exem- 
plified in Figure ^ they distribute spectrotemporal modulation frequencies among stimuli 
in different ways. Due to the peak-amplitude constraint on the dynamic spectra, they also 
employ markedly different modulation-frequency amplitudes; increasing the number of mod- 
ulation frequencies in a stimulus (implying more complex modulations) generally requires the 
amplitude of each frequency to be decreased so that their sum is contained within a given 
range. In any case, the amplitudes of all modulation frequencies within a given stimulus 
were identical. If a stimulus contained multiple modulation frequencies, their phases were 
randomly assigned; otherwise they were (arbitrarily) set to zero. Additional details about 
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these stimuli are provided later in Section 13.11 



A'' 

Ripple (17.2) 




Figure 1: The S magnitudes are illustrated for members of each of the three stimulus types — dynamic- 
ripple stimuli, TORCs, and STWN. The stimuli all have the same duration (250 ms), and contain 1, 6, and 
90 spectrotemporal modulation frequencies, respectively. By virtue of the dynamic range constraint on the 
intensities of the dynamic spectrum, the stimuli must employ different modulation-frequency amplitudes a. 
The amplitudes, relative to those of the STWN stimulus, are indicated in parentheses. 



The Fourier series endows dynamic spectra, thus designed, with a common periodicity 
of T = 250 ms and X = 5 octaves. One spectral period was realized in each stimulus, 
whose 5-octave bandwidth was centered upon the neuron's pure-tone tuning curve (measured 
previously). The temporal periodicity of the dynamic spectra was exploited; this enabled 
multiple observations of the response, since (assuming the neuron's memory is less than T 
seconds) all temporal periods beyond the first constitute identical stimulus presentations. A 
stimulus sweep consisted of a limited number (4 or 12) of stimulus periods, and had a rise 
and fall time of 8 ms. Multiple sweeps were presented for each stimulus. Sweeps of different 
stimuli, separated by 3-4 seconds of silence, were presented in a pseudorandom order, until 
a neuron was exposed 60-120 periods (15-30 s) of each stimulus. 

All stimuli were gated and fed through an equalizer into an earphone. Calibration of the 
sound delivery system (to obtain a flat frequency response up to 20 kHz) was performed in 
situ with the use of a 1/8 in. Briiel & Kjaer 4170 probe microphone. The earphone was 
inserted into the ear canal through the wall of the speculum to within 5 mm of the tympanic 
membrane. T he speculum and microphone setup resembles closely that suggested by Evans 
( Evan.4 Fm7i. 



2.2.4 Response Measurement and STRF Calculation 

Each stimulus resulted in a collection of response observations y[t] (i.e., binned spike trains), 
each member of which consisted of the number of spikes occurring in successive = 1 ms 
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intervals during one stimulus period (see, e.g., Figure |2f3). The total number of stimulus 
periods used was n. The transient epochs, during the first period of each sweep, were 
disregarded; only the steady-state portion of the response was utilized. The spike rate r[t] 

n 

was then estimated from the sample mean of y[t]/At: r[t] = - J2 yi{t]/At, where yi[t] is the 

1=1 

response to the i^^ stimulus period. This is the response whose Fourier Transform is used 
to calculate H (and subsequently the STRF), or some portion thereof, via Eq. These 
calculations are very simple and are completed in MATLAB (Mathworks) in a fraction of a 
second. 



2.2.5 Reducing Nonlinear Interference with the Inverse- Repeat Method 



In this article, we concentrate on even-order nonlinearities; they are ubiquitous in the brain 
(e.g., due to rectification), and can se verely d i stort the reverse-correlation measurement, 
particularly when the stimulus is brief ( Swerupl . 197^). Fortunately, its ill effects are easily 
isolat ed and extracted by the inverse-repeat method (^ MoUerL TlQ??! : IWickesberg and Geisler . 
1984^ . In its simplest form, this method calls for two stimuli (here, dynamic spectra) that 
sum to a constant value. While the linear responses to the two stimuli are opp osite in sign. 



the even-ordered distortion products are identical (jVictor and Shaplevl . 119801 ). Therefore, 
the even-order effects are removed by subtracting the two responses and dividing by two (or 
instead isolated by adding the responses). This method is investigated in conjunction with 
TORC stimulation. 



2.2.6 Signal and Noise Calculations: Non-systematic Errors 

As mentioned in Section [2. 1.31 the measures of signal power P and noise variance (cr^), and 
therefore the SNR, apply to both h[t,x] and H[w,fl\. For a single stimulus-response pair, 
a simple relationship was identified in Eq. © between the variance of H[w,Q] and the 
variance of R[w]. Note that latter variance is, in turn, proportional to the variance of y[w], 
the Fourier Transform of the response to one stimulus period; specifically, 

r^r n lVar\y\w]} 
Var{R[w]} = - (8) 

Thus, the variance of H[w,VL\ could be quickly estimated from the sample variance of y[w\ 
(across all stimulus periods), without repeating the experiment or subdividing the data. 

However, the H measurement may incorporate the measurements from multiple stimulus- 
response pairs; if so, its variance will depend on how the individual measurements are com- 
bined. If a point on H[w, VL\ is the average of N measurements, then its variance will simply 
tend to scale by 1/A^ with respect to that of an individual measurement. But more com- 
plicated functions of the indivi dual measurements (such as that used for the dynamic-ripple 
stimuli ( Depireux et al.l . 200l[ )) may obscure the relation between the variance of H and 



that of the constituent responses. In such a case, the bootstrap method may be employed. 
This method simulates the randomness of a statistic that is a function of a collection of 
i dentical observations, without repeating the experiment or subdividing the observations 
( Efron and Tibshirani . 19931 : Politisl . 1998| ). In the present context, a new H is computed 



from a new, identical-sized collection of y\i\, assembled by selecting members of the original 
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collection randomly and with replacement. The sample variance of H, or some function 
thereof, is calculated after repeating the process many times (we used 300), which is feasible 
due to the simplicity of the computations. 

For the sake of equal footing, we used the bootstrap method to estimate the variance of 
H for all stimulus types. After subsequently calculating (u^), the SNR was inferred from 
the average power of H, which, as mentioned in Section !^. 1.31 approximately equals P+ (a^). 



2.2.7 Signal and Noise Calculations: Systematic Measurement Errors 

The SNR quantifies the size of the signal compared to the size of the non-systematic compo- 
nent of the measurement error. However, the possible additional contribution of systematic 
errors — that is, those induced by non-ideal stimulus structure (i.e., e in Eq. and by non- 
linearities — cause the actual error level of the STRF measurement to exceed that described 
by the SNR. There exists an opportunity to obtain a more "correct" measure of the SNR, 
provided that all errors are evenly distributed over the STRF measurement, because the 
signal tends to be concentrated in an early region of the STRF measurement between and 
125 ms - in other words, neuron's responses are only weakly effected by stimulus conditions 
more thanl25 ms in the past. Accordingly, a corrected SNR measure, SNRcor, was obtained 
after dividing the average power of the early region of the STRF measurement by the average 
power of the late (post 125 ms) region. Note that the late region of the STRF measurement 
contains the uncorrelated contributions of both non-systematic and systematic errors, while 
the noise power estimate used for SNR only measures the non-systematic component; there- 
fore, SNRcor should be less than or equal to SNR (modulo the inaccuracies in measuring 
SNR and SNRcor), with equality when there are no systematic errors. 



2.2.8 Error Reduction with the Singular- Value Decomposition 



To further reduce errors in the STRF measurement, we investigated the singular- value de- 
composition (SVD), applied to either h[t,x] or H[w,Q] (which are both just matrices of 
numbers). The SVD is a wel l -studied tool for reso lving the structure of matrices that are 
corrupted by errors (jstewart, 19931 : iHansenl . 1998[ l. It works by breaking up an arbitrary 
matrix into a sum of separable matrices, which, in the current context, are each formed 
by the product of one temporal vector and one spectral vector. The first matrix takes the 
best separable approximation out of the original matrix; the second takes the best separable 
approximation out of the remainder, and so on. The importance of each separable matrix is 
gauged by its singular value, which is the square root of its average power. The total number 
of separable matrices required to describe a matrix (the number of nonzero singular values) 
is called the matrix's rank. 



A basic theorem (jStewartl . Il99ll ) implies that if the error-free STRF can be well approx- 
imated by only a few separable matrices, then the addition of small and evenly distributed 
errors will only slightly perturb their form, as they constitute the first few matrices in the 
SVD of the STRF measurement. The additional and subsequent matrices required to de- 
scribe the measurement will describe mostly errors, and thus should be discarded. In fact, 
there are a priori reasons to believe that STRFs are well approximated by low-rank matrices. 
Typically, cortical STRFs are locali zed in a compac t area of the spectrotempo ral domain and 
the modulation-frequency domain ( Depireux et al. . 2001 : Miller et al. . 2002| ): this alone will 
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limit their rank. Still lower limits will be impose d by special structure within the STRF 
or the H, such as sp ectral-temporal separabi lity (lEggermont et al. . 1981 : Depireux et al. . 
200l|_Sen et al. . 2001^). quadrant separability ( Depireux et all 20011 ) . and temporal symme- 



try (jSimon et all Isubnj ) 



II I practice, determining which separable matrices should be discarded is a complex prob- 
lem ( Stewart . 19931 : Hansen . 1998[ ). Most approaches use knowledge or assumptions about 
the size and structure of the errors to bound the singular values (or functions thereof) of those 
separable matrices describing mostly errors. Through simulations, we found that methods 
based solely on variability analysis tended to underestimate the size of the errors; instead, 
the most generally accurate methods gauged the error leve l directly from t he post-125-ms 
region of the STRF measurement (for a similar method see ()Sen et al. . 200l[ )). We used the 
largest singular value from this region (or its Fourier Transform) to threshold the singular 
values of the pre-125-ms region (or its Fourier Transform). In theory, the STRF (or H) is 
optimally approximated using only those separabl e matric es with singu lar values above this 
threshold, and discarding the remainder ( Steward . il993: Hansen . 1993). 

Although this approximation is in some sense optimal, it is still prone to error. As the 
error level increases, more and more error leaks into the approxir nation and, co nversely, more 
and more of the STRF power is lost under the error threshold ( Hansenl . Il998l ). This second 
case is of primary interest in this study; we will gauge the proportion of (error-free) STRF 
power excluded from the SVD approximation. A naive gauge of this i s asvn, the proportio n 
of the STRF measurement's power contained in the SVD remainder ( Depireux et al. 112 001 ). 
Unfortunately, when the level of measurement error is high, asvD itself will be inflated, 
because much of the remainder will consist of error. However, we can use the bootstrap 
method to estimate the size (average variance) of the part of the remainder resulting from 
non-systematic errors, and subtract it out. This leads to a more accurate gauge of the 
proportion of lost STRF power, particularly when the systematic errors are small: Psvd, 
the average power of the systematic component of the remainder, divided by P. In Section 
13.41 we use asvD and Psvd together to study how measurement errors effect the performance 
of the SVD. 



2.2.9 STRF Comparisons 

In this article, the correlation coefficient is used to quantify the similarity between two 
different STRF measurements. This takes values between —1 and -|-1, with -|-1 indicating 
a perfect match. Comparisons are made over the first 125 ms of the measurements, both 
before and after the SVD is applied. Note that the correlation coefficients for the pre-SVD 
comparisons will be limited by SNRcor', if two identical STRFs are corrupted by independent 
and identically distributed errors, the correlation coefficient should approximately equal 
SNRcorl (SNRcor + l)- To the extent that the SVD approximations result in increased SNRs, 
they will allow for higher correlation coefficients, which we modeled as gSNRcor/ {gSNRcor + 
1), where g represents a multiplicative gain in SNRcor- 



2.2.10 Simulations 

Simulations were employed in order to verify the performance of these methods under realistic 
conditions. The core of a simulation is an STRF (tailor-made or derived from a low-rank 
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approximation of an actual measurement) and a set of stimuli. The STRF-based responses 
to the stimuli are computed via Eq. Q. These responses are then altered; usually they 
are rectified and then subjected to another static nonlinearity, such as a squaring function. 
The result, representing the tim e-varving spike rate, is used to creat e spik e trains with 
inhomogeneous Poisson statistics ( Berry and Meistei . 19981 : Oram et al. . 1999| ). with a time 
step of 50 fis. These spike trains are treated as the responses of a neuron with an unknown 
STRF, and are subjected to the very same analyses as the real responses. Wherever the 
bootstrap method was employed, its expected performance was simulated against a Monte- 
Carlo procedure, employing 300 sets of independent responses with identical spike rates. 



3 Results 

The results of this study are presented as follows. In Section ITTl we detail the measurement 
of a neuron's STRF using each of the three stimulation types, and we subsequently illustrate 
the computation of the SVD-based STRF approximations. In Section for neurons whose 
STRFs were measured with multiple stimulus types, we examine the similarity between the 
multiple measurements and the corresponding SVD approximations, as a function of the 
level of measurement error. In Section ESI we analyze the origins and stimulus dependence 
of the measurement errors. Finally, in Section 13. 4^ we study how measurement errors affect 
the sufficiency of the SVD approximations. 



3.1 Overview 



In this section, we detail the measurement of a neuron's STRF using dynamic-ripple stimuli 
(Figure EK), TORCs (Figure EB), and STWN (Figure ET), respectively. The S magnitudes 
for examples of each of these stimulus types are illustrated in Figure ^ The respective 
STRF measurements are denoted STRF^j^^, STRFTOi?c, and STRFs^vKAf- Computation of 
the SVD-based approximations of the measurements is subsequently detailed. 



3.1.1 Dynamic-Ripple Stimuli 



For the dynamic-ripple stimuli (jKowalski et al.l . Il996al : iDepireux et al.l . l200lh shown in Fig- 



ure El^, each stimulus is composed of a single spectrotemporal modulation frequency (Fourier 
component). It can therefore be considered the a uditory equivalent to the drifting sinusoidal 
luminance gratings used in visual neuroscience (|Valois and Valoisl . ll99ol ). Figure El^ shows 
the dynamic spectrum of one such stimulus (top panel), which has a temporal modulation 
rate w of —8 Hz and a spectral modulation rate Q of 0.2 cyc/oct. 

The response r[t] to this stimulus (middle panel)) exhibits both linear and nonlinear 
aspects, as well as variability. According to the linear model of Eq. 0, the response should 
be a pure 8 Hz sinusoid, with amplitude and phase determined by H[8, 0.2]. Clearly, r[t] (C: 
blue) is modulated at 8 Hz, but it also contains nonlinear components. The (Discrete) Fourier 
Transform R[w] makes this explicit: In addition to a prominent 8 Hz component (in red), 
distortion products (in green) with frequencies of Hz (the "DC" or average of r[t] over t) and 
16 Hz are plainly visible. Given the stimulus composition, these distortion products betray 
the presence of 2'^'^-order, and possibly O^'^-order ("spontaneous" activity), nonlinearity (both 
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of which are even-order). With respect to the hnear plus DC description (red curve in inset 
panel), including the 16 Hz distortion product (black curve) better accounts for the sharpness 
and non-negative nature of the response. 

The remaining portion of the response looks like noise. It is the manifestation of the 
period-to-period response variability. In the Fourier Transform of the response, it takes the 
form of a shallow baseline of energy that extends over all frequencies. Note that the square- 
root of the response variance (i.e., the standard error), calculated via Eq. |H1 is similarly 
distributed over the components of R[w] (black curve). 

The existence of response components due to nonlinearity and variability does not nec- 
essarily imply that they interfere with the STRF measurement. Since the stimulus consists 
of a single spectrotemporal modulation frequency with a temporal component of 8 Hz, only 
the 8 Hz component of the response is correlated with the stimulus, and e in Eq. is zero. 
The only source of error is the portion of the nonlinearity and variability that happens to be 
manifest at 8 Hz. The reverse-correlation STRF (STRF^i/^) can be assembled from results of 
consecutive presentations of dynamic ripples with different single spectrotemporal modula- 
tion frequencies. It is important to note that, due to time constraints, these point-by-point 
measurements of H were restricted to two cross-sections, as indicated by the gray outlines in 
left panel (Bottom row ). The full H was then constructed from a normalized outer product 



of these cross-sections (|Depireux et al.L 120011 ) 



3.1.2 Temporally Orthogonal Ripple Combinations 

In contrast to the dynamic-ripple stimuli, the TORC stimuli (jKlein et al.L l200n[ ) can directly 
measure the entirety of the H, because each stimulus is used to measure multiple points 
at once. The stimuli are necessarily more complex, containing six spectrotemporal mod- 
ulation frequencies (Fourier components) each. However, no two Fourier components in a 
given stimulus share the same value of \w\ (they are temporally orthogonal; their temporal 
correlation is zero); therefore, each spectrotemporal modulation frequency in the stimulus 
will evoke a different temporal frequency in the linear part of the response. 

The dynamic spectrum of one TORC is shown in FigureEB (top panel). It is composed of 
six spectrotemporal modulation frequencies having the same Q of 0.2 cyc/oct, but different 
ly's spanning the range of 4 to 24 Hz. The associated response (inset in middle panel: 
blue) exhibits a complex modulation of the spike rate. The smoothed response, obtained by 
discarding the frequencies above 24 Hz, is superimposed in red. A more accurate view of 
the linear part of the response is also shown (in dashed black), which was obtained from the 
inverse-repeat procedure. It is very similar to the response predicted by STRF/)/? (Figure 
EK). The Fourier Transform of the response confirms the strong presence of the 4 to 24 Hz 
components (in red) expected from the linear model. However, with respect to the noise 
baseline, the response is weaker than it was for the above dynamic-ripple stimulus. 

In the reverse-correlation operation, the 4 Hz response component is orthogonal to all 
stimulus components besides the 4 Hz component, the 8 Hz response component is correlated 
only with the 8 Hz stimulus component, and so on; e is again zero. The STRF after all stimuli 
are presented, are shown in bottom panels. It bears a striking resemblance to STRF/^/?, 
despite the drop in both SNR and SNRcor- This indicates that estimates of the neuron's 
STRF are robust to changes in the spectrotemporal modulation frequency content of stimuli. 
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Figure 2: Measuring the STRF of one neuron with different types of ripple stimuh. A: (Top panel) Dynamic 
spectrum of a single dynamic-ripple stimulus with w = —8 Hz and — 0.2 cyc/oct. 90 stimulus periods were 
used. (Middle panel) Inset displays the response as time-dependent spike rate estimate, r[t]: Raw estimate 
(blue) (using At = 1 ms), linear (8 Hz) plus DC (0 Hz) approximation (red), and the approximation obtained 
by including the (even-order) 16 Hz distortion product (black). The magnitude of the Fourier Transform 
of response is shown in the panel, clearly exhibiting the linear 8 Hz component (red), nonlinear distortion 
products (green) , and the remaining noise component (blue) . Also shown is the square-root of the response 
variance (the standard error) as a function of frequency (black). (Bottom panels): Measurements of (left) 
H and (right) h (or STRF^jj^) after all 30 stimuli. The grey outhnes in H (left panel) indicate the cross- 
sections that were directly measured. B: (Top panel) Dynamic spectrum of a TORC with J7 = 0.2 cyc/oct 
and w's between 4 and 24 Hz. 75 stimulus periods were used. (Middle panel) Inset shows response as time- 
dependent spike rate estimate, r[t], after the inverse-repeat procedure: Raw estimate (blue), linear plus DC 
approximation (red) obtained by discarding frequencies above 24 Hz, and the response predicted from the 
previously obtained STRFui? in A (dashed black). (Bottom panels): Same as in A above using 15 pairs of 
TORC stimuli. C: (Top panel) Dynamic spectrum of a STWN with il's between 0.2 and 1.4 cyc/oct and w's 
between 4 and 24 Hz. 75 stimulus periods were used. (Middle panel) Inset shows response as time-dependent 
spike rate estimate, r[t\: Raw estimate (blue), the linear plus DC approximation (red), and the response 
predicted from STRFd/j (dashed black) and STRFtoac (dashed green) in A and B above. Panel also shows 
response Fourier Transform magnitude. The linear 4-24 Hz components (red) are barely distinct from the 
noise (blue). Also shown is the square-root of the response variance (the standard error) as a function of 
frequency (black). (Bottom Panels) The measurements of H and STRFst^vktv averaged over 30 stimuli. 



3.1.3 Spectrotemporally White Noise 

The spectrotemporally white noise (STWN) is the most complex of the ripple-based stimuli; 
its S contains all spectrotemporal modulation frequencies (Fourier components) with equal 
amplitudes and uniformly distributed phases. The typically poor efficacy of such stimuli can 
be improved somew hat by limit i ng the to a relevant range of spectral and temporal mod- 



ulation frequencies (jKlein et al. Figure (top panel) shows the dynamic spectrum 
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of one such stimulus, which contained all w's between 4 and 24 Hz and all f2's between 
and 1.4 cyc/oct. Although the response shown below (inset of middle panel) is quite a bit 
weaker than those observed in Figures |2l\ and B, when smoothed (red) it is still comparable 
to the linear predictions from both STRF^r (dashed black) and STRFTOijc (dashed green); 
this is despite the fact that the 4 to 24 Hz response frequencies predicted by the linear model 
are barely distinct over the noise baseline. 

This reverse-correlation scenario differs from that of the other two stimulus types. Each 
of the linear response frequencies is now the sum effect of multiple Fourier components of 
the stimulus sharing the same temporal modulation frequency. Every response frequency 
will in turn be correlated with each of the stimulus components sharing the same temporal 
modulation frequency. It is therefore not initially clear which stimulus component caused 
what component of the response; all points on the H corresponding to a given w cannot be 
distinguished. This ambiguity manifests itself in the form of a large e. 

Because e is dependent upon the (randomly assigned) phases of the S", it has an incoherent 
structure that is distributed over the entire measurement, and its strength can be reduced 
by averaging the results from multiple stimuli with differ ent phases (or by using more finely 
spaced ly's, i.e., increasing the base stimulus period T) ()Klein et al. . 2000). This argument 



also applies to the manifestations of variability and even-order nonlinearity (some odd-order 
distortion products are ho wever not dependent on the phases of the stimulus frequencies 
( Victor and Shaplevl . IToSoh ) . The result obtained after averaging the results from 30 different 



stimuli is shown in the bottom panels (approximately the same result would be obtained by 
extending T by a factor of 30). Despite a further decrease in SNR and SNRcor, its similarity 
to STRF£)ij (Figure EK) and STRFTOiJC (|2J3) is impressive; the STRF of the neuron has 
maintained its form for more than an hour, over vastly different stimulus types. 

3.1.4 Application of the Singular- Value Decomposition 

In this section, we demonstrate the use of the SVD for producing approximations of the 
measurements of the STRF and H. Such approximations represent an optimal trade-off 
between error r eduction and s i gnal loss , prov ided the errors are evenly distributed over the 
measurements (jstewart, 19931 : Hansen, 1998[ ). The proportion of signal lost is gauged by 



PsvD (see Methods). 

The SVD of the STRFto/jc from Figure|2l3 is illustrated again in FigureElA-. The singular 
values of the first 12 separable matrices from the SVD are shown (top row), along with the 
error-derived threshold (see Methods) indicated by the dashed line. The first singular value, 
corresponding to the separable rank 1 matrix (bottom row), towers over the others, and alone 
exceeds the threshold. The STRF is well described by this separable matrix, while the sum 
of the remaining separable matrices, consists of unstructured measurement errors. Indeed, 
PsvD = 4.8%, indicating that more than 95% of the STRF power is captured by this rank-1 
approximation. That is, in large part this STRF represents the product of independent 
spectral and temporal integration. 

In contrast, the SVD of a different neuron's STRFtoac is shown in Figures 133. This 
STRF does not look separable; for inputs at different tonotopic locations x, the temporal 
integration by the neuron (in its network) is not related by a simple scaling of the same 
function. In this case, the second singular value (top row panel) also protrudes above the 
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threshold, the rank-1 approximation (middle panels) fails to describe the STRF's oblique- 
ness, and PsvD is high at 28.2%. After including the second separable matrix (bottom row- 
panels), the approximation is vastly improved {Psvd = 6.7%), and the remainder again 
chiefly consists of unstructured errors. 



Original STRF 



125 
T (ms) 



Singular Values 




separable matrix # 



Rank 1 

Approximation Remainder 



PSVD^^.S' 



125 
T (ms) 



125 
T (ms) 



Singular Values 




separable matrix # 

Rank 1 

Approximation Remainder 



Psvd: 28.2% 



X (ms) 
I 

125 




Rank 2 

Approximation Remainder 



Psvd: 6.7% 



t 

125 
T (ms) 



t 



125 
T (ms) 



Figure 3: Approximating the STRF with the SVD. A: An STRF that looks separable. (Top row) The 
original measurement, and the corresponding singular values (bars) of the separable matrices of the SVD, 
and the error-derived threshold (dashed line). (Bottom row) The rank-1 approximation and the remainder. 
B: An STRF that does not look separable. (Top row) The original STRF and its corresponding singular 
values (bars) and threshold (dashed line). (Middle row) The rank-1 approximation and second-separable 
matrix (or remainder). (Bottom row) The rank-2 approximation, and the remainder. A common color scale 
is shared by all panels within A, and a different color scale is shared by all panels within B. 

The SVD can alternatively be applied to the H. While the SVD of the full H yields an 
approximation identical to that of the STRF, applying the SVD separately to each of the 
quadrants of the H will generally produce a different approximation. This procedure is of 
interest chiefly because previous studies (using dynamic- ripple stimulation) have suggested 
that the H^s of AI neurons a re well described as being quadrant-separable (jKowalski et al 



1996bl: Depireux et al. . 2001 ). implying that the SVD of each quadrant of the H should 



yield at most one separable matrix of significance. Therefore, if the STRF is not separable, 
it could be advantageous (in terms of error reduction) to approximate the STRF in this 
manner. This principle is examined in Figure |31 using the non-separable STRF from Figure 
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The SVD of each of the upper two quadrants of the H shown in EJ\ (middle panel) 
yields the two sets of singular values (bottom panel). In each quadrant, only the first 
singular value is pronounced and exceeds the threshold. This indication that the quadrants 
are indeed separable is supported upon comparison of the original STRF (top panel) with 
the quadrant-separable approximation (for which 13svd = 6.6%) and the remainder, shown 
in B. Intriguingly, the result is markedly similar to the rank-2 approximation of the STRF 
from Figure in|3. By implication, the H from the rank-2 approximation (shown in D) is very 
similar that from the quadrant-separable approximation (in C). The Fourier Transforms of 
the corresponding remainders are also very similar. 




Figure 4: Approximating the H with the SVD. A: (Top panel) The original STRF measurement and (Middle 
panel)the corresponding H magnitude (first two quadrants). (Bottom panel) The singular values (bars) and 
thresholds (dashed lines) of the first two quadrants of the H. B: The quadrant-separable approximation of 
the STRF in A and the remainder. C: The H magnitude from the quadrant-separable approximation, and 
the Fourier Transform of the remainder. D: The H magnitude from the rank-2 approximation (from Figure 
131), and the Fourier Transform of the remainder (from|3l A common color scale is shared by B and the top 
panel of A . A different color scale is shared by C, D, and the middle panel of A. 

In summary, we have demonstrated the use of the SVD for producing relatively error-free 
approximations of the STRF or H measurements. Later, in Section 13.41 we will examine 
how well these three types of approximations — the rank-1, rank-2, and quadrant-separable 
approximations — apply to the whole of the neuronal population, as a function of the error 
level and the type of stimulation. 



22 



3.2 Direct Comparisons of STRFs Measured with Different Stim- 
ulus Types 

In 45 out of 308 neurons whose STRFs we measured, we obtained multiple STRF mea- 
surements using two or all three stimulus types. The resemblance between the first 125 ms 
of each pair of measurements was quantified by the correlation coefficient (see Methods), 
which was computed under four conditions: for the raw (pre-SVD) measurements, and for 
the quadrant-separable, rank-2, and rank-1 approximations of the measurements. 

The correlation coefficients from the raw comparisons are plotted in Figure versus the 
limiting (minimum) SNRcor of the two measurements. The squares, triangles, and circles 
correspond to the three possible pairs of stimulus types compared. The trends followed by 
all stimulus comparisons are similar. When SNRcor is above 1, the correlation coefficients 
are high and are weakly affected by SNRcor- The correlation coefficients are only small when 
SNRcor is small; as SNRcor descends to 0, so do the correlation coefficients. This mirrors the 
relationship expected from two identical STRFs that are corrupted by independent errors, 
as indicated by the solid-black Curve 1. In other words, it the relationship produced when 
the STRFs of the system, summarized by the (error-free) STRF, is impervious to changes 
in stimulus type, but the STRF measurement is error-prone. 

Since the SVD approximations act to reduce errors, they should result in higher correla- 
tion coefficients, provided the STRF measurements have similar signal components. These 
properties are evident in the three dashed curves in EK, which summarize the correlation 
coefficients obtained from the quadrant-separable (Curve 2), rank-2 (Curve 3), and rank-1 
(Curve 4) approximations of each pair of measurements (the data points are not shown, for 
clarity). The curves fit the combined data from all three types of stimulus comparisons. The 
fits were produced by modeling the error reduction as a multiplicative gain g in SNRcor 
(see Methods). The values of g used for Curves 2-4 are 1.7, 1.9, and 2.9, respectively; these 
values minimized the number of data points deviating more than 0.1 units away from the 
curves (providing the most visually pleasing fits). 

For all data points exceeding the critical SNRcor = 1 level, FigureEB shows the complete 
range and the average of the correlation coefficients. Again, similar results are obtained no 
matter which two stimulus types are compared. For the raw measurements, correlation 
coefficients fall between 0.5 and 0.8, with an average of 0.64. The average rises to 0.73 
and 0.75 for the quadrant-separable and rank-2 approximations, respectively. For the rank- 
1 approximations, the correlation is 0.85 on average, is as high as 0.97, and does not fall 
below 0.74. The average correlations are still higher (0.71, 0.78, 0.80, and 0.88, respectively) 
when the comparisons are further restricted to the half-sized rectangular region containing 
the most power (e.g., the dashed box in the top row of EP), as indicated by the x's. Least 
affected are the rank-1 comparisons, suggesting that they are already relatively error free. 
Note that these values far surpass those typically produced by comparing the STRFs of 
different neurons; for example, if the rank-1 approximation of a neuron's STRF-pQ/jf; was 
compared to the rank-1 approximation of the subsequent neuron's STRF5'p^^, the average 
correlation was 0.03. 

Some visual comparisons of STRF measurements are available in Columns C through E 
of Figure El For each comparison, either the rank-1 or rank-2 approximations are shown, 
depending on what was optimal for the STRF with the highest SNRcor- In C are results 
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Figure 5: Similarity between STRFs measured with different stimulus types: Dynamic ripple (DR), TORC 
(TC) and STWN (WN). DR-TC, e.g., denotes comparisons between DR and TORC STRFs. Correlation 
coefficients were computed between the original (raw) measurements, and between the quadrant-separable 
(q-sep), rank-2 and rank-1 approximations of each measurement. A: Correlation coefficients plotted versus 
minimum SNRcor of the two original measurements. Squares, triangles, and circles correspond to the raw 
comparisons; different symbols correspond to the different pairs of stimulus types compared (see legend). 
Curve f (solid black) is the relationship expected from two identical STRFs with independent errors. Curves 
2, 3, and 4 (dashed curves) are fits to the correlation coefficients obtained from the quadrant-separable, 
rank-2^ and rank-1 approximations of each measurement, respectively (see text). B: The complete range 
(vertical lines) and the average (squares, triangles, and circles) of the correlation coefficients are shown for 
all comparisons where the minimum SNRcor was above 1. Also shown (black x's) are the average correlation 
coefficients, for all pairs of stimulus comparisons combined, obtained when the comparison is further limited 
to the half-sized rectangular region of the STRF containing the most power (see, e.g., dashed box on the 
top left STRF of Column C). Columns C, D, E: In each row, the STRF of the same neuron measured with 
different stimulus types are shown side by side. Shown are either the rank-1 or rank-2 approximations of 
the STRFs, depending on what was optimal for the measurement with the highest SNRcor- To the left are 
correlation coefficients obtained from each pair of comparisons and the SNRcor of the original measurements. 
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from three neurons that were tested with all three stimulus types. A typical rank-1 result is 
shown in the top row. The STRFs match in many details, including the suppressive areas 
and the multiple excitatory areas. In the middle row is a rank-2 example with somewhat 
lower-than-average correlation coefficients. While some features match well across stimuli, 
there is an increase in background fluctuations between STRF^iij and STRF^TVKAf that limits 
the comparisons. The rank-1 approximations may have been more appropriate here (and 
these yielded correlation coefficients over 0.8). In the bottom row is an unusual rank-2 
example, where the STRF peak shifts to a higher frequency, thus diminishing the correlation 
coefficients. However, SNRcor of the STRF^^vktv was only 0.8, so it is difficult to make 
definite claims about its structure. Results from additional neurons that were tested with 
two of the three stimulus types are provided in D and E. Overall, a wide variety of STRFs 
shapes, including unusual "offset" types (E, top row), are well preserved across stimulus 
type. To be sure, there is much less variation in STRF shape across stimulus type than 
there is across neurons. 

In summary, both visual and quantitative comparisons reveal a close resemblance between 
the STRFs measured with different stimulus types. The resemblance predictably increases 
as the limiting SNRcor of the measurements increases; similarly, using the SVD to reduce 
the error level only serves to increase their resemblance. The highest correlation coefficients 
result from the rank-1 approximations, indicating that they are the most error-tolerant. 
Similar results are obtained no matter which of the three possible pairs of stimulus types are 
compared. By the same token, a wide variety of STRFs are observed across neurons. 

Together, these observations indicate that linear spectrotemporal processing is a robust 
property of AI that takes diverse forms in individual neurons. 

3.3 The Sources and Stimulus Dependence of Measurement Error 

In Section 13.21 it was shown that the signal component of the STRF measurement, seen 
through the corrective lens of the SVD, is not crucially dependent on the stimulus type. 
Instead, the ability of the SVD to separate this signal from the measurement errors is crucially 
dependent on SNRcor, which may depend on the stimulus type. In this section, we examine 
the sources contributing to SNRcor and their stimulus dependence. 

3.3.1 Systematic Error 

The capacity of systematic errors to limit the quality of the measurements is evident in 
the relationship between SNR and SNRcor- This relationship, observed over all measure- 
ments for each stimulus type, is plotted in Figure 10 (with second-degree polynomial fits 
where appropriate). For both the TORC (A; F, Curve 1) and the STWN (D; F, Curve 4) 
measurements, SNRcor shows a clear saturating characteristic as SNR increases. Recall 
that SNRcor incorporates both the non-systematic and systematic errors, while the SNR 
incorporates only the non-systematic errors. Therefore, as the measurements become more 
reliable (SNR increases), the saturation of SNRcor evinces the systematic error that domi- 
nates when the non-systematic errors are sufficiently small. The relative significance of the 
systematic error component is revealed in the level to which SNRcor is limited in the high 
SNR measurements. 

Recall that for the TORC measurements (A; F, Curve 1), the inverse-repeat method was 
employed in order to remove systematic errors due to even-order nonlinearities. Therefore, 
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Figure 6: The relationship between SNR and SNRcor across all measurements for each stimulus type, 
and second-degree polynomial fits (black curves) when appropriate. The level of saturation of these curves 
indicates the relative levels of systematic errors in the measurements. A: TORC. B: TORC without inverse- 
repeat, thus retaining systematic errors due to even-order nonlinearities. C: TORC control, discarding half 
of the stimulus presentations, D: STWN. E: Dynamic ripple. F: Comparison of polynomial fits: Curves 1-4 
are from Figures A-D. Curve 5: STWN, discarding half of the stimuli, thus increasing systematic errors 
induced by the stimulus (e). Curve 6: STWN control, discarding half of the stimulus presentations. 

the saturation of SNRcor in the TORC measurements should be worsened if the inverse- 
repeat method is not used. Indeed, bypassing the inverse-repeat method did further hmit 
SNRcor (B; F, Curve 2), by a factor of about 2.5. Note that this is not simply a side effect 
of SNR reductions caused by discarding half of the data, for it is not observed if half of the 
stimulus presentations are discarded but inverse- repeat is still employed (C; F, Curve 3). 

In the STWN measurements (D; F, Curve 4), the systematic errors are much more severe 
than in the TORC measurements; the limiting value of SNRcor is at least 4 times lower, 
and so SNRcor is much less likely to exceed usable values. SNRcor is also less variable 
across the high-SNR measurements; when the measurements are reliable, which is fairly 
often, SNRcor reliably reaches its limited potential. This potential is further cut in half by 
discarding half of the stimuli (F, Curve 5), but not by discarding half of the presentations of 
each stimulus (Curve 6). In sum, these observations suggest that the errors are dominated 
by the nonideality of the STWN stimuli (i.e., e), to which all neurons were exposed. Our 
simulations also supported this view. Therefore, at least 4 times as many STWN stimuli 
would have to be used in order to raise the SNRcor potential to the level of the TORC 
method. 

Finally, note that the relationship between SNR and SNRcor is less clearly defined in 
the dynamic-ripple measurements (E) (although both SNRcor and SNR often surpass the 
values achieved by the other two stimulus types). In our experience, this is largely because 
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the e rrors are not uniformly distributed over the dynamic-ripple STRFs (jPepireux et al. . 



20011 ). due to the outer-product operation in the construction of the H. As a result, SNRc 



is a less reliable gauge of the overall error level in the dynamic-ripple measurements. 
3.3.2 Non-Systematic Error 

In Section 13.3.11 it was shown how the potential accuracy of the STRF measurements is 
limited by the level of systematic error, which depended on the stimulation method. However, 
if a method is to achieve a given level of accuracy within its potential, it is evident in Figure 
inithat the SNR (which reflects the level of non-systematic error) must be at least minimally 
adequate. In this section, we explore how the SNR is determined from the interplay between 
the stimulus, the STRF, and the neuronal response. 

To set the stage, recall from Eq. © that a single stimulus-response pair results in the 
measurement of a set of one or more points on H[w, Q], which is given by the spectrotemporal 
modulation frequencies content of the stimulus. By Eq. ® , the variance of each point {w, Q) 
is a fixed proportion, namely 1/a^, of the variance of the response's Fourier Transform at 
the corresponding (temporal) frequency w (a^ is the power of each of the spectrotemporal 
modulation frequencies in the stimulus). Now, consider the whole of the H measurement, 
built stimulus-by-stimulus. To simplify matters, we will first consider the situation in which 
every point of the measurement has resulted from a single stimulus-response pair — that is, 
prior to the TORC inverse-repeat procedure, the STWN phase-averaging procedure, or the 
dynamic-ripple outer-product operation. In that case, to find the variance of any point on 
the H, one needs only to find the variance of the appropriate response at the appropriate 
frequency, and weight it by 1/a^. Consequently, the average variance of the entire H (and 
STRF) measurement, (cr^), is simply 1/a^ times the average variance at all of the relevant 
frequencies of all responses. The SNR is then the ratio of P (the STRF signal power) to this 
number. 

What determines the variance of a response's Fourier Transform? Two observations lead 
to a simple answer. First, as middle panels in Figures |2K, |2P, and typify, the vari- 
ance of R[w] is nearly frequency-invariant (deviations from this c ould reflect refractoriness. 



burstiness, or oscillations in the response (jBair and KochL Il996[ )). Therefore, the average 



variance over the relevant frequencies is closely related to the average variance over all fre- 
Now, the average variance over all frequencies equals the average variance over 



quencies. iNow, tne average variance over ail irequencies equals tne average variance over 
all times ( Papouli^ . 19621 : lOppenheim and Schafeii . IT989[ ). which ties in the second obser- 



vation: The variance of r[t] is proportional to r[t]/n (where n is the number of stimulus 
periods). This originates from a linear relationship between the sample mean and the sam- 
ple variance of the binned spik e train responses (y[t]), which is a widely reported observation 
( Shadlen and Newsomd . 1998[ l. Consequently, the average variance over time is proportional 



to the average spike rate over time, f. So finally, all else being equal across stimuli, f (over 
all responses) can be treated as the lone variable determining the average variance of the 
responses over the relevant frequencies. The relationship observed across all STRF measure- 
ments is shown in Figure [7|A., where the variance has been transformed into the variance 
of a single response period by multiplying by n (thus correcting for differences in n across 
measurements). The trend across all neurons is indeed linear (on this log-log plot, the slopes 
of the linear fits to the data were very close to 1), and is only weakly influenced by stimulus 
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Fi gure 7: The sources and stimulus dependence of SNR, for the dynamic ripple stimuli (blue dots), 
TORCs (red circles), and STWN (black diamonds). A: The linear relationship between the average spike 
rate f and the average variance of the response's Fourier Transform, f is averaged over all responses. The 
variance is averaged over all responses, but only those temporal frequencies of each response relevant to the 
H measurement (where the corresponding S magnitude was nonzero, e.g., w = 4, 8, , 24 Hz). The variance is 
scaled by n (the number of stimulus periods) to correct for differences n across the measurements, and thus 
represents the variance of a single response period. B: (dashed lines) The expected relationships between f 
and (a^) (scaled by n) in the case where each point of the iJ's is obtained from a single stimulus-response 
pair. The actual relationships observed (plotted points) differ from the dashed lines by an amount predicted 
by the number of stimulus-response pairs whose results are averaged to obtain the final H (see text). C: The 
lower bound of f is proportional to the square-root of the STRF signal power P (the diagonal line's slope is 
1/2). The square-root law is expected from a linear-plus-rectification response model, but the scatter in f 
suggests additional sources of variability. 

In contrast, the choice of stimulus type effects order-of-magnitude differences in (due 
to differences in the number of spectrotemporal modulation frequencies per stimulus; recall 
Figure Q). This in turn strongly effects the STRF variance (cr^) for a given average spike 
rate f. Given the relationship observed between f and average response variance in[7|\, the 
predicted relationship between f and (a^) (again scaled by n) for each of stimulus type is 
indicated by the dashed lines in B. Note, however, that for a given neuron, the actual effect 
of stimulus type on (a^) depends on how f is also affected. Curiously, we have seen little 
evidence for a significant effect of stimulus type on f. From one type to the next, up to 
factor-of-two increases or reductions in f were typical, but this variation is not systematic 
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and is small compared that of a?. 

The actual relationship between the average spike rate and the STRF variance observed 
across all STRF measurements is indicated by the data points plotted in B. The discrepancies 
between these trends and the dashed lines, where they exist, are easily explained by the fact 
that every point of the actual H measurements is not the result of just one stimulus-response 
pair, as we have so far assumed. For the STWN stimuli, H[w, VL\ was the average result from 
30 stimulus-response pairs; therefore, its actual variance (cx^) (black diamonds) was lower 
than the black (upper-most) dashed line by a factor of 30. This largely compensated for 
the difference in a? between the STWN and TORC stimuli. Similarly, the inverse-repeat 
method effectively averages the results from two sets of stimuli, and so the (cr^) of the final 
TORC result (red circles), was cut in half with respect to the red (middle) dashed line. 
Finally, we observed that the (a^) of the final dynamic-ripple H (blue dots), each point of 
which results from the normalized product of two individual measurements, was typically 
similar to that of the measured cross-sections alone. Therefore, its relation to f was similar 
to the black (lower-most) dashed line, albeit with quite a bit of scatter. Overall, these 
properties conspired to produce SNRs that were, on average, a factor of 5 lower in the 
TORC measurements than in the dynamic-ripple measurements, and an additional factor of 
2 lower in the STWN measurements. 

For each stimulus type, the average spike rate f observed across neurons ranged over 
roughly two orders of magnitude. Figure[7p shows that the value of f is partially predicted by 
the STRF power P, in that f, and more strictly its lower bound, tends to grow by the square- 
root of P (the black line on this log-log plot has a slope of 1/2). A square- root relationship 
is expected from the linear response model followed by rectification: Generally speaking, 
STRFs (and if's) with higher magnitudes result in spike rates with proportionally stronger 
modulations, which, since the spike rate must be positive, result in proportionally higher 
f's; meanwhile, P grows as the square of the STRF magnitudes. Since f translates linearly 
into variance, this implies that STRFs with higher average power P, although associated 
with higher absolute levels of variability, have the potential to achieve higher SNRs; and this 
potential is realized in those neurons with the lowest f allowed for a given P. Note that the 
data from all stimulus types overlap, reinforcing the idea that f is not significantly affected 
by stimulus type. 

In summary, the ingredients of SNR are of two largely independent varieties: properties 
of the stimulus and properties of the auditory system. The key stimulus properties boil down 
to the power in each spectrotemporal modulation frequency a^, to which the SNR is inversely 
proportional, and the number of stimulus-response pairs used to measure each point of the 
H (including n, the number of periods of each stimulus), to which SNR is proportional. The 
system properties reduce to the STRF power P and the average spike rate f, to which the 
SNR is proportional and inversely proportional, respectively. Furthermore, f can be seen as 
the sum of two positive-valued components. One is proportional to the square-root of P, 
as predicted by a linear-plus-rectification response model. The other not obviously related 
to the STRF, and represents an additional source of variability that varies in strength from 
neuron to neuron. The net result is that an increase in P serves to increase the SNR, while, 
for a given P, an increase f counteracts this effect. 
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3.4 Sufficiency and Error Dependence of the SVD-Based Approx- 
imations 

In Section the SVD approximations of STRFs measured with different stimulus types 
were found to be highly similar when SNRcor (which reflects the level of measurement error) 
was adequate in both measurements. The stimulus dependence of SNRcor was then analyzed 
in detail in Section In this section, we further examine how the SVD approximations are 
influenced by SNRcor- Primarily, we are concerned with the extent to which measurement 
errors may prevent the SVD from resolving features of the "true" (error-free) STRF. 

For this purpose, it would be useful to know the proportion of the true STRF's power 
lost from an SVD approximation of the measurement. Unfortunately, in the presence of 
measurement error, this quantity is not precisely knowable. One way to estimate it is to 
compute the propor tion of the STRF meas urement's power lost from an SVD approximation. 



which we call asvD fiDepireux et al.l . l200lh . In total, we will consider a^y^i, ce^svD^ '^^svd^ 
which speak to the sufficiency of the rank-1, rank-2, and quadrant-separable approximations, 
respectively. One obvious disadvantage of asvD is that it is inflated in the presence of 
measurement errors (which comprise much of the measurement's lost power). This is evident 
in Figures |HK through C, where a^y^? (A); ^^s^vd (B), and a^y^ (C) are plotted versus 
SNRcor for all TORC and STWN STRFs (recall that SNRcor is unreliable for the dynamic- 
ripple STRFs). The influence of SNRcor on asvn clearly persists up to high SNRcor s. 

We reduced the dependence of asvD on the error level by removing the effect of the non- 
systematic errors (see Methods). The improved measure, Psvd is a more accurate gauge of 
the proportion of lost STRF power, especially when the systematic errors are small (e.g., in 
the TORC measurements). In theory, Psvd should be more tolerant than asvD to changes 
in SNR, and asvD should converge down to Psvd with increasing SNR. These properties 
are verified in Figures |HP through F, where Psvd (red circles) and asvD (back dots) are 
plotted versus SNR for the TORC measurements (the only caveat is that at very low SNRs, 
PsvD becomes unstable). It is concluded (with additional support from our simulations) 
that at moderate to high SNRs, the effect of non-systematic error is accurately removed 
in the computation of Psvd- Therefore, Psvd estimates the proportion of the systematic 
part of the STRF measurement relegated to the SVD remainder, and better reflects the true 
STRF's structure. To be conservative, we will consider f3svD only in those measurements 
with SNR^s over 1.5. 

The relationship between Psvd and SNRcor for the 82 TORC measurements meeting 
this criterion is plotted in Figures [Hp through I. The blue -|-'s and red x's denote the 50 
and 31 measurements optimally approximated by rank-1 and rank-2 matrices, respectively 
(the lone rank-3 approximation is not shown). At moderate to high SNRcor^s (e.g., above 
2), the PsvD distributions are only weakly dependent on SNRcor- In other words, the 
SVD approximations are only weakly affected by measurement errors, and therefore (3svd 
should more accurately reflect the structure of the true STRF. Therefore, the typical range 
of (^SVD (|Htj)? roughly from 3% to 25%, indicates many STRFs are poorly described by 
rank-1 approximations. It is reassuring that the lower and upper portions of this range are 
dominated by the measurements optimally approximated by rank-1 and rank-2 matrices, 
respectively. However, the boundary between the two populations progressively shifts from 
about 5% at the highest SNRcor to nearly 15% at the lowest SNRcor- This reflects the 
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Figure 8: Sufficiency of the SVD approxi- 
mations as a function of the error leveL A-C: 
otsvD, the proportion of the STRF measure- 
ment's power lost from the SVD approxima- 
tions, for all TORC (black o's) and STWN 
(red x's) measurements. D-F: Psvd (red 
o's) and asvD (back dots) versus SNR for 
all TORC measurements. Psvd estimates 
the proportion of the systematic part of the 
STRF measurement relegated to the SVD 
remainder, and therefore better reflects the 
true STRF's structure. At very low SNRs, 
PsvD is unstable (some points lay beyond 
the axis limits). G-I: Psvd versus SNRcor 
for all TORC measurements with SNR above 
1.5. Black +'s and red x's denote those mea- 
surements optimally approximated by rank-1 
(separable) and rank-2 (non-separable) matri- 
ces, respectively. With /S^y^ (G) typically as 
high as 25%, many STRFs are not well de- 
scribed by the rank-1 approximations. In con- 
trast, p'gyjj (H) and /Sgy"^ (I) are typically 
weh below 10%, indicating that all STRFs 
are well described by both the rank-2 and 
quadrant-separable approximations. The un- 
usually high Psvd's at the lowest SNRcor s 
indicates that the SVD is unable to resolve 
the structure of some non-separable STRFs 
with high error levels. J-L: asvD, computed 
as asvD but from the quadrant-separable (J, 
K) and the rank-2 (L) approximations of the 
TORC (black o's) and STWN (red x's) mea- 
surements. M-0: As expected, the asvD^s 
are well matched to the corresponding to 
PsvD^s in those TORC measurements with 
SNRcor above 2. 



fact that the optimal trade-off between error reduction and signal loss afforded by the SVD 
approximations gets worse as SNRcor decreases; at higher error levels, the true STRF must be 
further from being rank-l before the second separable matrix of the SVD becomes dominantly 
signal and is included in the approximation. 

Over this same range of suitably high SNRcor's, pfyj^ (H) and /^^y'^ (I) are universally 
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bound below 10%, with averages of 3.1% and 3.6%, respectively. That is, the true STRFs are 
almost completely contained within both the rank-2 and quadrant-separable approximations 
of TORC measurements with suitably low error levels. Indeed, as was illustrated in Section 
13.1.41 the two approximations were usually very similar. 

When SNRcor is low, a handful of measurements have conspicuously high values of Psvd 
(G), pfvD (H), or /35^'5 (I). There are three plausible reasons for this: (1) The systematic 
errors in these measurements are unusually large (thus inflating Psvd)] (2) The true STRFs 
are actually poorly described by these SVD approximations, and coincidentally the measure- 
ments have a high error level; (3) Because of the high error level, the SVD of these STRFs 
shapes is being disrupted, and more STRF power is being lost than otherwise would be. We 
favor the last reason, since (despite the error level) most of these STRFs appear to have 
non-separable shapes. Such STRFs are are also found at higher SNRcor s, but these high 

(2) (OS) 

values of Psvd ^'^d Psvd found at higher SNRcor's. 

Although they are needed to fully describe many STRFs, the trade-off to using the rank- 
2 or quadrant-separable approximations instead of the rank-l approximations is that they 
retain a higher proportion of the measurement error. This was earlier indicated in Figures 

and B. Similarly, for the these TORC measurements, we estimated (using the bootstrap 
method) that the SNR of the rank-l approximation is on average 3.4 ±0.6 times higher than 
that of the raw measurement, while for the rank-2 and quadrant-separable approximations, 
the average gain in SNR is reduced to 2.0 ± 0.6 and 1.9 ± 0.6, respectively. Note that 
these values are comparable to the SNRcor gain values g employed in Section 13.21 Although 
the rank-l approximations have higher SNRs, which means that they remove proportionally 
more noise than signal from the measurements, the proportion of signal removed (as gauged 
by Psvd) is unacceptably high for many STRFs. 

In order to cross-check the results obtained from Psvd, we recomputed asvD from the 
SVD approximations (denoted by asvo), rather than from the raw measurements. For ex- 
ample, if the quadrant-separable approximation is indeed a complete and relatively error-free 
version of the true STRF, then computing d^y^, and a^svD from it should yield results close 
to the corresponding (3g^yjj and (from the raw STRF measurement). Similarly, com- 

puting d^^^ from the the rank-2 approximation should yield a result close to /J^y'^. These 
cxsvd's are plotted in Figures |H1J through L versus SNRcor for both the TORC and STWN 
measurements. With respect to the original a^yo's in|HJ\ through C, they are more tolerant 
to changes in SNRcor over a wider range of SNRcor's. When SNRcor is above 2, these asvD^s 
are indeed closely matched to the corresponding Psvd's, as Figures [HM through O attest. 
When SNRcor drops below 1, the asvD^s rapidly increase and lose their correspondence with 
(3 SVD) presumably because the assumption that the SVD approximations are complete and 
error-free rapidly breaks down. 

In this section, we have concentrated on the TORC measurements. They are ideal in 
that they produced low levels of systematic error and a wide range of SNRcor s. The STWN 
measurements were less than ideal in that SNRcor was limited below 2. In Section 13.3.11 
this was found to be chiefly due to high levels of stimulus-induced systematic error; indeed 
PsvD was grossly inflated in these measurements, rendering it no more illuminating than 
a SVD (not shown). Nevertheless, over the range of SNRcor that they can be compared, 
the distributions of asvD in Figures IHIA. through C and dsvD in J through L were very 
similar for the STWN and TORC measurements. Moreover, from Section 13.21 the SVD 
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approximations of STWN and TORC measurements were increasingly well matched as the 
error level dropped. Therefore, the available evidence supports the hypothesis that, for 
a given level of measurement error, the STWN results and TORC results are equivalent, 
but the STWN results are much more error prone. The dynamic-ripple results were less 
than ideal in that STRF^i? is qua drant-separable by con struction. Additionally, it contains 



non-unifqrmly distributed errors (jPepireux et al.L l200lh . which complicates both the SVD 



(Stewart, Il993[ ) and the interpretation of SNRcor- With this caveat, we note that the 
distribution (although not the range) of /^ly^) was skewed toward somewhat higher values in 
the dynamic-ripple measurements. For instance, P^svd exceeded 10% in 61% of STRF^i/^'s 

(2) 

versus 45% of STRFtorc's. Still, P'svd 

was below 5% in 91% of STRF^/j's; the indications 
were that most STRFi:)/j's were still well described by rank-2 approximations. 

In summary, the optimal SVD approximation of an STRF measurement with a sufficiently 
low error level (e.g., SNRcor above 2) does well describe the STRF, in that it preserves at 
least 90% of the STRF's power. Therefore, we can be confident that if the SVD approx- 
imations of two STRF measurements are well matched, so are the corresponding STRFs. 
However, when there exist higher levels of measurement error, this is no longer guaranteed 
to be the case, particularly for STRFs that contain a significant non-separable component. 
Overall, around 60% of the TORC measurements were well described as being separable. 
The rest were better served by both rank-2 and quadrant-separable approximations, which 
were essentially identical. To the extent that they could be compared, the STWN and 
dynamic-ripple measurements produced similar results. 

4 Discussion 



A pseudo-random exploration of the space of spectrotemporal patterns, fostered by the 
traditional methodology of reverse correlation, has been the basis of most previous STRF 
measurements. Instead, we applied a deterministic and analytical reformulation of reverse 
correlation, which is based upon the Fourier-series description of dynamic spectra. One 
advantage of this approach concerns experimental optimization: It enables us to restrict 
the stimulus space to a minimal, discrete set of spectrotemporal patterns (the spectrotem- 
poral modulation frequencies, presented simultaneously or individually). It also facilitates 
our understanding of measurement errors and their various stimulus- and response-induced 
components. In sum, it enables us to design stimuli that are efficient and effective, while 
taking into account general knowledge of the STRF structure, response nonlinearity and 
variability, and specific laboratory constraints. A second advantage concerns experimental 
evaluation: Since any given dynamic spectrum can be described by its Fourier series, we can 
understand and quantify the performance of different stimulation methods, even if they were 
devised within different frameworks. Both of these advantages have been demonstrated in 
this study, where we have measured STRFs of AI neurons with three very different types of 
stimuli. 

We now discuss the major empirical results of this study. 
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4.1 Linearity 

The most striking finding is that when the STRF of an AI neuron is successfully measured 
with different types of stimuli, the results are very similar. The STRFs themselves exhibit 
a high degree of richness and diversity across neurons. The three types of stimuli used, 
Dynamic Ripples, TORCs, and STWN differ greatly in their spectrotemporal characteristics 
and statistics (c.f. Figures [TJ and top panels in |2K,B,C), and indeed they sound quite 
distinct from one another. Great differences even exist between stimuli of a given type 
(except for STWNs, which all sound noise-like). That STRFs measured from such widely 
different stimuli are so similar speaks to the significance and robustness of the linearity 
of neurons' responses with respect to the dynamic spectra of stimuli. Strong nonlinear 
system behavior would almost surely interfere with the STRF measurements, not allowing the 
STRFs generated from such different stimuli to have such large correlation coefficients (except 
trivial cases such as static nonlinearities, e.g., rectification). The correlation coefficients are 
especially large considering that the STRF measurements contain large low-power regions 
(error-prone even after the SVD), and furthermore compared measurements were often made 
over an hour apart. 

4.2 Efficacy of the stimuli 

Although, when successful, they lead to very similar STRF measurements, the three types 
of stimuli differ in their rates of success. Success is achieved when the STRF measurement 
contains sufficiently low levels of both non-systematic and systematic errors, reflected by the 
measures of SNR (using only non-systematic error) and SNRcor (including systematic error). 
Non-systematic errors, caused by response variability, are reduced when the modulations 
in the stimulus are more powerful (evoking stronger modulations in the response relative 
to the average spike rate), and also by averaging the results from stimuli with identical 
spectrotemporal statistics. Systematic errors, caused when multiple stimulus components 
evoke interfering response components (either linearly or nonlinearly) , are reduced by careful 
stimulus design, or by averaging the results from stimuli with identical spectrotemporal 
statistics (but different individual characteristics). Note that all of the stimulus types used 
had approximately the same total presentation duration. 

On balance, the stimuli that gave the best results were TORCs, which benefitted from 
careful stimulus design and relatively strong responses, due to the restricted number of spec- 
trotemporal modulation frequencies in each stimulus. As a result, we have noted that usable 
STRF measurements could have been obtained after presenting one sweep of each TORC 
stimulus (taking about 3 minutes), a fact that we intend to exploit in the future. STWN, 
while strongly motivated by the traditional reverse correlation methodology, gave STRFs 
with substantially more systematic error than TORCs. While both stimuli are capable of 
giving STRFs with high SNR, the STWN results in substantially poorer SNRcor- This is 
most cleanly seen by comparing figure panels ElA. andEP: both stimulus types give STRFs 
with SNR as high as 30, but STWN generated STRFs have SNRcor that saturate below 2, 
while TORC generated STRFs have SNRcor saturating at substantially higher values. 

Although the dynamic-ripple stimuli produce the most reliable results (highest SNR), 
they suffered a fundamental flaw: Too many stimuli were required to measure the full MTF^t- 
(and hence its STRF), and so measurements were restricted a subset of stimuli required if 
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the MTF5T is quadrant-separable. This is problematic for two main reasons. First, it makes 
it impossible to assess the quadrant-separability assumption directly. Although quadrant- 
separability holds in (ketamine-anesthetized) AI, there may be other neuronal populations 
or experimental conditions for which it doesn't. Second, the full MTF^t measurement is a 
more complex (nonlinear) function of the individual stimulus-response relationships. This 
complicates the evaluation of measurement errors, and thus blurs the distinction between 
neural functionality and methodological artifact. Indeed, the dynamic-ripple results had a 
few subtle idiosyncrasies, including more non-separable STRFs, and SNRcorS poorly corre- 
lated with other assays of measurement errors (SNR) and STRF structure {asvD, Psvd)- 
However, since the measurements are so reliable, it may be feasible to sacrifice some SNR by 
reducing the numbe r stimulus repetition s in order to present all stimuli required to directly 
measure the full H f Versnel et al. . 2002| ). 



Finally, we note that the TORC approach is not limited to the particular stimuli used 
in this study. Any combination of spectrotemporal modulation frequencies could exist in 
each stimulus, provided that they are temporally orthogonal. Therefore one can produce 
"super" TORCs, using fewer (but longe r-duration) stimuli , each of which contains many 



spectrotemporal modulation frequencies (jKlein et al.l . 120001 ). These stimuli are more noise 



like, but benefit from a lack of stimulus-induced systematic measurement errors in contrast 
to the STWN stimuli. We are currently investigating the effectiveness of such stimuli. 



4.3 The SVD: error reduction and signal loss 

In this paper, we used the SVD to reduce errors in the STRF measurements. The SVD 
is ideally suited for use with the STRFs measured here, because their SVD is strongly 
dominated by the lowest order terms; that is, they are well approximated by a small number 
of fully separable {rank-1) matrices. When such STRFs are perturbed by unstructured errors, 
the SVD is still strongly dominated by the lowest order terms, and has a well-understood 
contribution from higher order terms. The boundary between the low order (high signal, 
low error) and high order (low signal, high error) terms is not known a priori, but is well 
understood from signal detection theory. The upshot is that truncating the SVD series of 
an STRF at low order is an efficient and well-understood way of increasing SNR while 
minimizing loss of signal. 

Of the STRF measurements that were suitably error-free, more than half were not only 
optimally approximated but well approximated (as reflected by Psvd) by fully separable 
(rank-l) matrices. These approximations reduced the error power by at least a factor of 
3 while sacrificing less than a tenth of the signal power. The rest of the STRFs required 
two SVD terms {rank-2 approximations); using only one SVD term (rank-l) would give an 
incomplete view of the system functionality due to excessive signal loss. The rank-2 approx- 
imations have somewhat diminished error reduction, down to a factor of 2. Alternatively, 
the quadrants of the MTF5T could be approximated by fully separable matrices, producing 
results very similar to the rank-2 approximations. However, if the error level was too high 
(e.g., SNRcor below 1), the optimal SVD approximations no longer reliably achieved both 
significant error reduction and adequate signal retention. The error level should always be 
considered when interpreting the results of the SVD. 
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4.4 The SVD: functional implications 

It is intriguing that STRFs are equally well described by rank-2 and quadrant-separable 
approximations (see Figures |HH and I). These properties, each special in their own right, 
do not necessarily imply one another. It turns out that if an STRF is both rank-2 and 
quadrant-separable, special phase relationships must exist, in either the temporal or spec- 
tral dimensions (or both), between the separable mat rices of the SVD or equivalently the 
quadrants of the MTF^^. It has been demonstrated ({Simon et al. . subm) that AI STRFs 
possess this property in the temporal dimension (but not necessarily the spectral). This 
itself has strong theoretical implications for the network connectivity of those neurons. 



4.5 The error measures 

We found that incorporating systematic errors (otherwise known as bias) into our consider- 
ation of the total measurement error level is absolutely crucial for aligning the results from 
different types of stimuli, and thus understanding the structure of an STRF measurement 
(and the resulting SVD approximations, correlation coefficients, etc.) independent of stim- 
ulus type. We used and analyzed two different measures of error: SNR and SNRcor- SNR 
is the more classical but more limited of the two; SNR is the ratio of the measured STRF 
power to the measured STRF variance (square of the standard error). This definition of SNR 
(and its associated measure of error) is not able to incorporate systematic error, however. In 
contrast, SNRcor does incorporate systematic error. SNRcor is the ratio of measured STRF 
power to measured non-STRF power (e.g., the power in the spectrotemporal region where 
the underlying STRF is expected to have near- zero power). 

The only problem with SNRcor is that it requires assumptions about the structure of 
the errors and the STRF, which may not apply to all STRFs and stimuli. In particular, 
we assumed (based primarily on observations) that errors are evenly distributed over the 
measurements, and that the STRF power is near zero for r above 125 ms (using negative r's 
is no different since the stimuli were periodic). The usefulness and predictability of SNRcor 
demonstrated that these assumptions largely held for the TORC and STWN measurements. 
This was not the case for the dynamic-ripple measurements, however, likely due to a combi- 
nation of response nonlinearity and the nonlinearity of the STRF measurement itself, which 
distributes the errors non-uniformly in the spectrotemporal (and modulation frequency) do- 
main. It will be even more useful in the future to devise measures of the systematic errors 
that are less dependent on the structure of the STRF measurement. 



4.6 Response variability 

In our investigation of non-systematic errors in the STRF measurements, several observations 
concerning the variability of AI responses have interesting functional implications. For ex- 
ample, the fact that the response variance could be linearly predicted from the average spike 
rate in a nearly stimulus-independent manner points to a Poisson-lik e spike generation mech- 
anism , which has been vigorously investigated in the visual system ( Shadlen and Newsomd . 
Il998| ). Additionally, we found (see Figure [7p) that while neurons with higher-power STRFs 
(higher P) tended to fire more spikes (higher f, as might be expected from a linear-plus- 
rectification response model), a range of average spike rates were still observed for any given 
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STRF power. Neurons with the lowest spike rates (for a given P) corresponded to the 
highest-SNR STRFs, and had the sharpest, most phase-locked responses of the population 
(not shown). Neurons with the highest spike rates often had seemingly random responses 
and poor-quality STRF measurements. We will consider the origins and implications of such 
behavior more carefully in future studies. 



4.7 Related studies 

Other recent studies have also addressed the similarity of STRF meas urements with different 



types of stimuli, albeit in different auditory loci. Escabi and Schreiner (lEscabf and Schreinei 



2OO2I ) measured STRFs in cat inferior coUiculus (IC) with stochastic stimuli that in some 



respects resemble the dynamic-ripple stimuli and STWN used here. While their results 
largely agree with ours, they singled out a small group of neurons that either exhibited 
extremely selective and phase-locked responses to the dynamic-ripple-like stimuli but were 
unresponsive to the STNW-like stimuli (type-II neurons), or exhibited non-phase-locked 
nonlinear responses to both stimuli (type-Ill neurons). As discussed above, in AI we also find 
that neurons' responses can be extremely sparse and yet yield significant STRFs (like their 
type-II neurons). However, we did not observe two distinct populations of neurons; rather, 
the degree of phase locking in response to all stimuli ranged over a continuum. In addition, 
some AI neurons exhibited significant spike rates but poor STRF measurements (like their 
tj^e-III neurons). Although we have not yet found a nonlinear relationship between these 
responses and the dynamic spectra of the stimuli, we can no t yet rule out that possibility. 
In another study, Theunissen et. al. ( Theunissen et al. . 20o3) measured STRFs in the zebra 



finch auditory forebrain in response to random tone sequences and bird songs, and used 
the STRF from one stimulus to predict the responses to the other. They found small but 
significant differences in the cross-predictability of the responses, which was poor overall. 
These differences either reflect differences in the STRF-measurement method (which was 
implemented as a nonlinear function of the responses), or more probably reflect a higher 
degree of nonlinearity in the responses of neurons in the avian auditory forebrain with respect 
to mammalian AI (but see ( Schafer et al. . IT992| ). who reported a higher degree of linearity 
and predictability). 



4.8 Nonlinearity 

This article has been concerned with nonlinearities only insofar as they interfere with the 
STRF measurement, and methods were invoked to reduce this interference (e.g., the inverse- 
repeat method). Other methods are also available, such as more carefully choosing the 
temporal modulation frequencies in the TORC s, so that the nonliri ear distortion products 
are also orthogonal to the linear response (a la ( Victor and Shaplev . ll98Qi )). That is not so 



say that nonlinearities form an insigniflcant part of the AI response, merely that linearity 
is important, strong, and robust to changing stimulus conditions, and therefore forms an 
sturdy foundation upon which the study of auditory cortical processing can be based, even 
in its nonlinear aspects. 

We are currently investigating several anticipated nonlinearities. These include the non- 
linear transformation of responses occurs at the thalamo-cortical depressing synapse, which 
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contributes a rapid adaptation of onset responses towards a steady state within a fe w tens of 
milliseconds (|DenhamL Emi iKowalski et alllToOGailPhilliDs et all 120021 : lHeil . ll997t ) (we con- 
sidered only the steady-state response in this study). Additionally, we have observed that 
when stimuli contain both low and high modulation frequencies, AI responses ca n phase 
lock t o much higher frequencies than previously expected (e.g ., 100 — 200 Hz) fElhilali et al 



2004 ) . Similar effects have been observed in the visual system ( Bair and Koch , IQQgI: Reid et al. 



1992 : Chance et al. . 19981 ). In our stimuli, these high modulation frequencies result from in- 



teractions between unresolved AM tones (that fall within the bandwidth of the same cochlear 
filter), even though they were not part of the target dynamic spectrum (and therefore did 
not contribute to the STRF measurement). A third nonlinearity is the potential dependence 
of responses on the bandwidth of the stimulus. Broadband sustained stimuli (such as the 
ripples, TORCs, and STWN) likely bias cortical cells in a manner different from that of 
narrowband or transient stimuli such as tones and clicks. Consequently, pre dicting details of 



tone and click responses f rom t he STRF may prove sometimes problematic (|Kowalski et al 



1996bl : iTheunissen et all l2000f ). However, this nonlinearity is irrelevant when the focus is 



on comparing STRFs derived from similarly broadband and sustained stimuli, as is the case 
in this paper. Yet another important source of nonlinear effects are static nonlinearities 
(e.g., rectification, response saturation) with respect to stimulus level and contrast. By fix- 
ing stimulus contrast at near maximum (90 %), and the absolute le vel at an intermediate 
value (e.g., based on the rate-level function ( Kowalski et al. . 1996bl ) we have managed to 
obtain reliable reproducible results from a sizable proportion of cells in Al. Finally, there 
are fundamental nonlinearities that we have not yet convincingly observed in AI responses , 
such as units analogous to the complex cells of the visual cortex ( Valois and Valoil 1990l ). 
Nevertheless, it is likely that a significant proportion of the very low SNR STRFs observed in 
this study belong to cells that would be classified as nonlinear in that they either phase-lock 
poorly to our st imuli or respond to rnore co mplex patterns that we have not been able to 



probe (e.g., see ( Escabf and Schreiner . 2002| )) 
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